strip_identifier can cause a panic with cjk identifiers #138
+16
−4
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Calling
strip_identifierwith identifiers that contain cjk characters causes a Rust panic as shown belowUsing cjk characters in identifiers is not common, but I found examples where cjk characters are used as identifiers and the panic happened.
Javascript identifiers are not limited to ascii characters and can include unicode characters.
The current implementation stored only the start byte index of each character in
end_idxwhile iterating, and then sliced the string using an inclusive range&s[..=end_idx].For example, the string "한글", '한' occupies bytes 0–2 and '글' occupies bytes 3–5, but when processing '글',
i = 3is stored asend_idxand slicing with&s[..=3]breaks the UTF-8 character boundary.This results in a
byte index is not a char boundarypanic.To fix, the code now tracks the end position of each character instead of the start position by calculating
end_idx = i + c.len_utf8(), and uses an exclusive range&s[..end_idx]when slicing.This change covers not only CJK characters but also other non-ASCII Unicode identifiers.